Implement dynamic capacity for kubernetes task runner #18591
Conversation
@Override
public boolean isSidecarSupport()
{
  return staticConfig.isSidecarSupport();
Check notice (Code scanning / CodeQL): deprecated method or constructor invocation, KubernetesTaskRunnerStaticConfig.isSidecarSupport, in ...extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunnerStaticConfig.java (fixed).
LGTM, can remove the deprecated defaultIfNull calls.
@GabrielCWT, thanks for putting this together. I might be mistaken but I am not entirely sure if the task capacity of a cluster should be a dynamic config. Changing the task capacity should not be a very frequent requirement for any cluster. Could you please elaborate on why you feel the current setup is not adequate?
Restarting the Overlord can be time-consuming and potentially risky as we could face issues when trying to redeploy the Overlord instance. Updating the task capacity dynamically allows admins to adjust the cluster safely, reducing operational downtime and complexity. While changes to task capacity are infrequent, I feel that providing a safer runtime option would help to minimize disruptions.
I think it's true that changing the task slots for the MiddleManager requires a restart, since we might need to redeploy MiddleManagers to servers with bigger resources. But for K8s-based task scheduling, the resources are allocated on the K8s side, outside of Druid, so restarting the Overlord does not make much sense; we should have the ability to reload the capacity dynamically. As @GabrielCWT has stated above, restarting the Overlord is a heavy and risky operation in production.
Thanks for the responses, @FrankChen021, @GabrielCWT. Even though with the K8s task runner the task pods are not technically a part of the Druid cluster, the Overlord still has to manage those tasks. The KubernetesTaskRunner itself keeps separate threads for each running task to track their status (this PR also updates that thread count, if I am not mistaken). So I should imagine that a major change in task capacity would also require some kind of scaling of the Overlord itself. Also, why is the Overlord restart "risky" or even "slow"? Doesn't a version upgrade require an Overlord restart too?
In a K8s deployment, increasing the task capacity does not always require increasing the Overlord resources. For example, we generally set the CPU limit to a higher value while keeping the CPU request relatively low at the initial deployment, so when the capacity is increased there is no need to increase the CPU resources. I say risky because the Overlord needs to restore all tasks; previously we had some problems (maybe a bug) where, after switching leaders, the Overlord failed to elect a new leader. We try our best not to restart the Coordinator/Overlord in production.
Yes, there might be some bugs around that. Also, the K8s task runner makes certain list pod calls, which are pretty heavy.
Oh, how frequently do you upgrade your cluster? I agree that the K8s task runner is buggy and we should improve upon it. Instead, we should try to fix the actual problems in the task runner which make the Overlord leader switch erroneous. What are your thoughts, @FrankChen021?
We don't upgrade clusters very frequently, maybe once a year or even less often. We do adjust the capacity (upsize or downsize) regularly based on load/requirements.
The main idea of dynamic configuration is not to circumvent problems during the restart phase; it's about reducing operational complexity and saving time. Even if restarting the Overlord is smooth, I don't think changing such a configuration should require a restart from the users'/operators' point of view. For static configurations, operators have to change configuration files, sync the files to Kubernetes, and restart components; it's a heavy workflow.
Thanks for the clarification, @FrankChen021! I am just a little apprehensive since the K8s task runner is already pretty buggy. I haven't gone through the whole PR yet. Will try to do a thorough review today.
Hi @kfaraz, are you reviewing this PR? I hope we can merge it into Druid 35.
  }
}

private void syncCapacityWithDynamicConfig()
This code is not maintainable.
Can we do this in a separate thread instead of calling sync everywhere?
Is currentCapacity thread safe?
It is possible to make currentCapacity thread safe.
Can we do this in a separate thread instead of calling sync everywhere?
This was an alternative solution, though I am not sure if we are thinking of the same implementation. My solution was to have a thread which periodically checks whether there have been any changes to the capacity and updates currentCapacity accordingly. However, I wasn't sure if it was a waste of resources, as there would need to be a trade-off between responsiveness (how quickly the changes become visible) and resources (since we need to wake the thread up every X seconds).
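For context, a minimal sketch of the periodic-polling alternative described above, assuming a hypothetical CapacityPoller class and a capacitySupplier stand-in for the dynamic-config lookup (this is not code from the PR, and the discussion below ends up rejecting this approach):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical polling-based capacity sync, illustrating the alternative discussed above.
class CapacityPoller
{
  private final AtomicInteger currentCapacity = new AtomicInteger();
  private final ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();

  // capacitySupplier stands in for reading the dynamic config; pollSeconds is the
  // responsiveness vs. resource trade-off mentioned above.
  void start(Supplier<Integer> capacitySupplier, long pollSeconds)
  {
    poller.scheduleWithFixedDelay(() -> {
      Integer latest = capacitySupplier.get();
      if (latest != null && latest != currentCapacity.get()) {
        currentCapacity.set(latest);
        // resize the task runner's thread pool here
      }
    }, pollSeconds, pollSeconds, TimeUnit.SECONDS);
  }
}
```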
Using a separate thread to check is also a bad idea; it does not solve the real problem here but increases the complexity.
The real problem here is that the config manager does not provide a notification mechanism when it detects configuration changes.
If we look at the config manager implementation, it provides a method swapIfNew to check and set new values. This is a place where we can add notification.
I think we can add a new overridden watch method which accepts a Runnable as a callback.
This callback is kept in the internal ConfigHolder.
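A rough sketch of that idea, using a simplified stand-in for the internal ConfigHolder (the real Druid class and its swapIfNew signature differ; watch here is just an illustrative name for the proposed callback registration):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

// Simplified stand-in for the config manager's internal holder; not the actual Druid class.
class ConfigHolderSketch<T>
{
  private final AtomicReference<T> reference = new AtomicReference<>();
  private final List<Runnable> callbacks = new CopyOnWriteArrayList<>();

  // Register a callback that fires whenever the held value changes.
  void watch(Runnable callback)
  {
    callbacks.add(callback);
  }

  // Mirrors the check-and-set behaviour of swapIfNew: callbacks only fire on a real change.
  boolean swapIfNew(T newValue)
  {
    T old = reference.get();
    if (newValue != null && !newValue.equals(old) && reference.compareAndSet(old, newValue)) {
      callbacks.forEach(Runnable::run);
      return true;
    }
    return false;
  }
}
```

With something like this in place, the task runner can register a single listener at startup instead of calling a sync method before every task.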
I've added observer support for ConfigHolders
Thank you, the watcher-based approach looks much cleaner.
Yes, @FrankChen021, I will go through the changes either today or tomorrow.
@kfaraz any comments?
This pull request hasn’t been merged into the master branch yet, which means it won’t be included in the upcoming Druid 35 release. It will most likely be part of Druid 36 once it’s reviewed and merged.
Hey @FrankChen021, yeah, still planning on going through this PR. Will try to prioritize it for later this week.
if (dynamicConfigSupplier == null || dynamicConfigSupplier.get() == null || dynamicConfigSupplier.get().getCapacity() == null) {
  return staticConfig.getCapacity();
}
return dynamicConfigSupplier.get().getCapacity();
Won't this be a bit weird for the cluster admin?
In the static configs we supply, let's say, 10, but in the dynamic config we supply 20. How would the cluster admin reason about this?
How are these configs reconciled?
You raise a good point. I don't think there is any way around this issue. I have updated the docs to state that the static config value will be overridden in the event that a dynamic config has been set. Hopefully this would add a little more clarity for users.
I am open to suggestions to make this even clearer.
  }
}

private void syncCapacityWithDynamicConfig(KubernetesTaskRunnerDynamicConfig config)
This looks much cleaner. Thanks @FrankChen021 for the suggestion and @GabrielCWT for the impl.
this.currentCapacity = config.getCapacity();
this.tpe = new ThreadPoolExecutor(currentCapacity, currentCapacity, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(), Execs.makeThreadFactory("k8s-task-runner-%d", null));
this.exec = MoreExecutors.listeningDecorator(this.tpe);
configManager.addListener(KubernetesTaskRunnerDynamicConfig.CONFIG_KEY, StringUtils.format(OBSERVER_KEY, Thread.currentThread().getId()), this::syncCapacityWithDynamicConfig);
Would this run in the Jetty thread?
Yes, it will run in the same thread as the one handling the API call. Would your concern be that, if more listeners were added in the future, updating the dynamic configs would become a long blocking call?
// currently worker categories aren't supported, so it's hardcoded.
protected static final String WORKER_CATEGORY = "_k8s_worker_category";

private int currentCapacity;
This should be made thread safe. Use an AtomicInteger.
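For illustration, the suggestion amounts to something like the sketch below; CapacityHolder is a hypothetical name, not code from the PR:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: an AtomicInteger lets task-submission paths read the capacity while the
// config listener updates it, without additional locking.
class CapacityHolder
{
  private final AtomicInteger currentCapacity;

  CapacityHolder(int initialCapacity)
  {
    this.currentCapacity = new AtomicInteger(initialCapacity);
  }

  int get()
  {
    return currentCapacity.get();
  }

  // Returns the previous value so the caller can tell whether the thread pool needs resizing.
  int update(int newCapacity)
  {
    return currentCapacity.getAndSet(newCapacity);
  }
}
```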
Description
The aim of the PR is to enable changing capacity for the KubernetesTaskRunner. This would be done through the existing POST API /druid/indexer/v1/k8s/taskrunner/executionconfig.

K8s configuration changes
In order to do this, I have added a new interface KubernetesTaskRunnerConfig and renamed the existing config to KubernetesTaskRunnerStaticConfig. The interface will be implemented by the existing static config and a new KubernetesTaskRunnerEffectiveConfig, which will be a wrapper class to encapsulate both the dynamic and static configs. The effective config will fall back to the static config's capacity if the dynamic config has not been set.
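A simplified sketch of the fallback behaviour described here; the class and field names below are illustrative, and the actual KubernetesTaskRunnerEffectiveConfig in the PR wraps full config objects rather than bare capacity values:

```java
import java.util.function.Supplier;

// Illustrative fallback: the dynamic capacity wins when set, otherwise use the static value.
class EffectiveCapacitySketch
{
  private final int staticCapacity;
  private final Supplier<Integer> dynamicCapacitySupplier;

  EffectiveCapacitySketch(int staticCapacity, Supplier<Integer> dynamicCapacitySupplier)
  {
    this.staticCapacity = staticCapacity;
    this.dynamicCapacitySupplier = dynamicCapacitySupplier;
  }

  int getCapacity()
  {
    Integer dynamic = dynamicCapacitySupplier == null ? null : dynamicCapacitySupplier.get();
    return dynamic != null ? dynamic : staticCapacity;
  }
}
```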
Changes to /druid/indexer/v1/k8s/taskrunner/executionconfig behaviour

The API will now take a new capacity field. On top of this, the fields will now be optional. If any field is null or not passed, we will use the existing dynamic config values.

Release note
New capacity field for the /druid/indexer/v1/k8s/taskrunner/executionconfig POST API. It will change the capacity for the KubernetesTaskRunner.

Challenges
In order to update the capacity for the task runner, I am calling a new function syncCapacityWithDynamicConfig before every task is run in order to update the thread pool to the newest config.

The issue with this is that any changes by the user will not be immediately reflected on the web console's homepage under the "Tasks" widget. The "task slots" would only be updated after a new task has been run.
I could not find a way to add a callback to the updating of dynamic configurations and felt that having a check every few seconds to see if the dynamic configuration had been updated was unnecessarily complex. I am open to suggestions if there are better ways to update the task runner.
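For readers unfamiliar with resizing a live pool, the sketch below shows the kind of adjustment syncCapacityWithDynamicConfig needs to make on the ThreadPoolExecutor; it is illustrative only and not the PR's actual code:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative resize of a running ThreadPoolExecutor to a new capacity.
class PoolResizeSketch
{
  private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
      4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>()
  );

  void resize(int newCapacity)
  {
    // Order matters: grow the maximum before the core when increasing, and shrink the
    // core before the maximum when decreasing, so core never exceeds maximum.
    if (newCapacity > pool.getMaximumPoolSize()) {
      pool.setMaximumPoolSize(newCapacity);
      pool.setCorePoolSize(newCapacity);
    } else {
      pool.setCorePoolSize(newCapacity);
      pool.setMaximumPoolSize(newCapacity);
    }
  }
}
```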